Hands-on Exercise 3: Processing and Visualising Flow Data

Author

Muhamad Ameer Noor

Published

December 1, 2023

Modified

December 2, 2023

1 Overview

Spatial interaction encompasses the dynamics of movement including people, goods, or information between locations in geographical space. This broad concept includes diverse activities such as global trade, transportation schedules, and even pedestrian movements.

Each spatial interaction involves a discrete origin/destination pair, represented as a cell in a matrix. The matrix, known as an origin/destination matrix or spatial interaction matrix, has rows corresponding to origin locations and columns to destination locations.

In this analysis, we’ll construct an OD matrix using the Passenger Volume by Origin Destination Bus Stops dataset obtained from LTA DataMall.

2 Getting Started

We’ll employ four R packages for this analysis:

  • sf: For handling and transforming geospatial data.
  • tidyverse: For data manipulation and visualization.
  • tmap: For creating thematic maps.
Code
pacman::p_load(tmap, sf, DT, stplanr, performance,
               ggpubr, tidyverse)
#pacman::p_load' ensures all specified packages are installed and loaded

3 Preparing the Flow Data

Importing the OD Data

We begin by importing the Passenger Volume by Origin Destination Bus Stops dataset using the read_csv() function from the readr package.

Code
odbus202308 <- read_csv("../data/aspatial/origin_destination_bus_202308.csv.gz")
odbus202308 <- data.frame(lapply(odbus202308, factor))
odbus202308$TOTAL_TRIPS <- as.numeric(odbus202308$TOTAL_TRIPS)
odbus202308$TIME_PER_HOUR <- as.numeric(odbus202308$TIME_PER_HOUR)
# The dataset is converted to a dataframe with appropriate data types

# display the summary of the dataset
glimpse(odbus202308)
Rows: 5,709,512
Columns: 7
$ YEAR_MONTH          <fct> 2023-08, 2023-08, 2023-08, 2023-08, 2023-08, 2023-…
$ DAY_TYPE            <fct> WEEKDAY, WEEKENDS/HOLIDAY, WEEKENDS/HOLIDAY, WEEKD…
$ TIME_PER_HOUR       <dbl> 17, 17, 15, 15, 18, 18, 18, 18, 8, 18, 15, 11, 11,…
$ PT_TYPE             <fct> BUS, BUS, BUS, BUS, BUS, BUS, BUS, BUS, BUS, BUS, …
$ ORIGIN_PT_CODE      <fct> 04168, 04168, 80119, 80119, 44069, 44069, 20281, 2…
$ DESTINATION_PT_CODE <fct> 10051, 10051, 90079, 90079, 17229, 17229, 20141, 2…
$ TOTAL_TRIPS         <dbl> 7, 2, 3, 10, 5, 4, 3, 22, 3, 3, 7, 1, 3, 1, 3, 1, …

Extracting the Study Data

Our focus is on weekday commuting flows between 6 and 9 AM.

Code
odbus6_9 <- odbus202308 %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE, DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
# Filtering and aggregating data for specific time and day

# View the extracted data
datatable(odbus6_9)
Code
# Displaying the data in an interactive table format

4 Working with Geospatial Data

We’ll use two geospatial datasets:

  • BusStop: Locations of bus stops from LTA DataMall as of July 2023.
  • MPSZ-2019: URA Master Plan 2019 sub-zone boundaries.

Importing Geospatial Data

The datasets will be imported as follows:

Code
busstop <- st_read(dsn = "../data/geospatial", layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source `C:\ameernoor\ISSS624\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
Code
# Importing and transforming the BusStop data

mpsz <- st_read(dsn = "../data/geospatial", layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source `C:\ameernoor\ISSS624\data\geospatial' using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
Code
# Importing and transforming the MPSZ-2019 data
mpsz
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...
Note
  • st_read() imports shapefiles into R as sf data frames.
  • st_transform() alters the projection to CRS 3414 for uniformity.

5 Geospatial Data Wrangling

Combining Busstop and mpsz

We’ll merge the planning subzone codes from the mpsz dataset into the busstop dataset.

Code
busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C) %>%
  st_drop_geometry()
# Merging data and retaining essential columns

# Note: Five bus stops outside Singapore's boundary are excluded.

# check the output
datatable(busstop_mpsz)
Code
# Viewing the combined data

Now, let’s append the planning subzone codes to the odbus6_9 dataset:

Code
od_data <- left_join(odbus6_9, busstop_mpsz, by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE, ORIGIN_SZ = SUBZONE_C, DESTIN_BS = DESTINATION_PT_CODE)
# Joining and renaming columns for clarity

# check the data
glimpse(od_data)
Rows: 227,177
Columns: 4
Groups: ORIGIN_BS [5,005]
$ ORIGIN_BS <chr> "01012", "01012", "01012", "01012", "01012", "01012", "01012…
$ DESTIN_BS <fct> 01112, 01113, 01121, 01211, 01311, 07371, 60011, 60021, 6003…
$ TRIPS     <dbl> 177, 111, 40, 87, 184, 18, 22, 16, 12, 21, 2, 3, 1, 1, 1, 1,…
$ ORIGIN_SZ <chr> "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", …

Checking for duplicate records is crucial, if duplicates exist, we keep only unique records:

Code
duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
# Identifying any duplicate records

od_data <- unique(od_data)
# Retaining only unique records

glimpse(od_data)
Rows: 226,610
Columns: 4
Groups: ORIGIN_BS [5,005]
$ ORIGIN_BS <chr> "01012", "01012", "01012", "01012", "01012", "01012", "01012…
$ DESTIN_BS <fct> 01112, 01113, 01121, 01211, 01311, 07371, 60011, 60021, 6003…
$ TRIPS     <dbl> 177, 111, 40, 87, 184, 18, 22, 16, 12, 21, 2, 3, 1, 1, 1, 1,…
$ ORIGIN_SZ <chr> "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", …
Code
# check the data

We’ll now complete the dataset with destination subzone codes:

Code
od_data <- left_join(od_data, busstop_mpsz, by = c("DESTIN_BS" = "BUS_STOP_N")) %>%
   rename(DESTIN_SZ = SUBZONE_C)
# Further enriching the dataset with destination subzone codes

glimpse(od_data)
Rows: 227,523
Columns: 5
Groups: ORIGIN_BS [5,005]
$ ORIGIN_BS <chr> "01012", "01012", "01012", "01012", "01012", "01012", "01012…
$ DESTIN_BS <chr> "01112", "01113", "01121", "01211", "01311", "07371", "60011…
$ TRIPS     <dbl> 177, 111, 40, 87, 184, 18, 22, 16, 12, 21, 2, 3, 1, 1, 1, 1,…
$ ORIGIN_SZ <chr> "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", …
$ DESTIN_SZ <chr> "RCSZ10", "DTSZ01", "RCSZ04", "KLSZ09", "KLSZ06", "KLSZ06", …
Code
# check the data

Checking for and removing any remaining duplicates:

Code
duplicate <- od_data %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
# Re-checking for duplicates

od_data <- unique(od_data)
# Ensuring all records are unique

glimpse(od_data)
Rows: 226,846
Columns: 5
Groups: ORIGIN_BS [5,005]
$ ORIGIN_BS <chr> "01012", "01012", "01012", "01012", "01012", "01012", "01012…
$ DESTIN_BS <chr> "01112", "01113", "01121", "01211", "01311", "07371", "60011…
$ TRIPS     <dbl> 177, 111, 40, 87, 184, 18, 22, 16, 12, 21, 2, 3, 1, 1, 1, 1,…
$ ORIGIN_SZ <chr> "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", "RCSZ10", …
$ DESTIN_SZ <chr> "RCSZ10", "DTSZ01", "RCSZ04", "KLSZ09", "KLSZ06", "KLSZ06", …
Code
# Check the data

Finally, we’ll prepare the data for visualisation:

Code
od_data <- od_data %>%
  drop_na() %>%
  group_by(ORIGIN_SZ, DESTIN_SZ) %>%
  summarise(MORNING_PEAK = sum(TRIPS))
# Final data preparation step

glimpse(od_data)
Rows: 20,600
Columns: 3
Groups: ORIGIN_SZ [310]
$ ORIGIN_SZ    <chr> "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01…
$ DESTIN_SZ    <chr> "AMSZ01", "AMSZ02", "AMSZ03", "AMSZ04", "AMSZ05", "AMSZ06…
$ MORNING_PEAK <dbl> 1866, 8726, 12598, 2098, 7718, 1631, 1308, 2261, 1526, 14…
Code
# check the final data

6 Visualising Spatial Interaction

We’ll now create and visualize desire lines using the stplanr package.

Removing Intra-zonal Flows

Intra-zonal flows are not required for our analysis:

Code
od_data1 <- od_data[od_data$ORIGIN_SZ != od_data$DESTIN_SZ,]
# Removing intra-zonal flows for clarity in visualization

glimpse(od_data1)
Rows: 20,309
Columns: 3
Groups: ORIGIN_SZ [310]
$ ORIGIN_SZ    <chr> "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01…
$ DESTIN_SZ    <chr> "AMSZ02", "AMSZ03", "AMSZ04", "AMSZ05", "AMSZ06", "AMSZ07…
$ MORNING_PEAK <dbl> 8726, 12598, 2098, 7718, 1631, 1308, 2261, 1526, 141, 655…
Code
# check the output

Creating Desire Lines

Here’s how to create desire lines using the od2line() function:

Code
flowLine <- od2line(flow = od_data1, zones = mpsz, zone_code = "SUBZONE_C")
# Generating desire lines between different zones

glimpse(flowLine)
Rows: 20,309
Columns: 4
$ ORIGIN_SZ    <chr> "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01", "AMSZ01…
$ DESTIN_SZ    <chr> "AMSZ02", "AMSZ03", "AMSZ04", "AMSZ05", "AMSZ06", "AMSZ07…
$ MORNING_PEAK <dbl> 8726, 12598, 2098, 7718, 1631, 1308, 2261, 1526, 141, 655…
$ geometry     <LINESTRING [m]> LINESTRING (29501.77 39419...., LINESTRING (29…
Code
# check the output

Visualising the Desire Lines

To visualize the lines:

Code
# Enable tmap to automatically check and fix invalid polygons
tmap_options(check.and.fix = TRUE)

# Now create your plot
tm_shape(mpsz) +
  tm_polygons() +
  flowLine %>%
  tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3)

Code
# Thematic map showing the intensity of commuting flows

Focusing on selected flows can be insightful, especially when data are skewed:

Code
tmap_mode("view")
tmap_options(check.and.fix = TRUE)
Code
flowLine %>%  
  filter(MORNING_PEAK >= 5000) %>%
  tm_shape() +
  tm_lines(lwd = "MORNING_PEAK", col = "orange", style = "quantile", scale = c(0.1, 1, 3, 5, 7, 10), n = 6, alpha = 0.3)
Code
# Filtering and visualizing only significant flows